What you really want is to hand the character processing off to a background routine so your mainline code can execute as quickly as possible. One very effective way to do this is to implement a ring buffer.
Imagine an array of 16 characters and two indexes:
#define TX_BUFF_SZ 16
char TxBuffer[TX_BUFF_SZ];
int InsertIDX = 0;
int ExtractIDX = 0;
When we want to write data into the buffer, we use InsertIDX to write into the next available slot, and then increment the index:
TxBuffer[InsertIDX++] = write_ch;
And when InsertIDX gets to the end of the buffer, we wrap it back to zero as follows:
if (InsertIDX >= TX_BUFF_SZ)
    InsertIDX = 0;
This code works in the generic case, but if you set your buffer size to a power of 2, you can get a little fancy by bitwise AND'ing InsertIDX with TX_BUFF_SZ - 1 to mask off the high-order bits:
InsertIDX &= (TX_BUFF_SZ - 1);
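Since the masking trick silently breaks if someone later changes the buffer size to, say, 20, it can be worth guarding it at compile time. Here is a minimal sketch, assuming a C11 compiler; wrap_next() is a hypothetical helper name I made up for illustration, not part of the code above:

```c
#include <assert.h>   /* static_assert (C11) */

#define TX_BUFF_SZ 16

/* A power of two has exactly one bit set, so n & (n - 1) is zero.
   If TX_BUFF_SZ is ever changed to a non-power-of-2, the build fails
   here instead of the index mask misbehaving at run time. */
static_assert(TX_BUFF_SZ > 0 && (TX_BUFF_SZ & (TX_BUFF_SZ - 1)) == 0,
              "TX_BUFF_SZ must be a power of 2");

/* Hypothetical helper: advance an index by one slot, wrapping to 0. */
static int wrap_next(int idx)
{
    return (idx + 1) & (TX_BUFF_SZ - 1);
}
```

With a size of 16 the mask is 0x0F, so an index of 15 advances to 16 (binary 10000) and the AND clears it back to 0.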
Putting it all together, we can create a generic ring-buffer write function that also enables a transmitter empty interrupt as follows:
void rb_putchar(char ch)
{
    TxBuffer[InsertIDX++] = ch;
    InsertIDX &= (TX_BUFF_SZ - 1);
    TxEmptyInterruptEnable = 1; // Enable the TxEmpty interrupt
}
So long as TxBuffer is at least as large as the largest amount of data we might want to send out in a single printf() call, the above code will allow us to buffer the data and return control back to the calling routine as quickly as possible - i.e. it is no longer 'blocking'.
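If you can't guarantee that sizing rule, one option is to have the write function refuse a character rather than overwrite data that hasn't been sent yet. The following is only a sketch under some assumptions: rb_try_putchar() is a hypothetical name, TxEmptyInterruptEnable is modeled as a plain variable standing in for the real hardware bit, and one slot is deliberately left unused so that a full buffer can be told apart from an empty one (which means the usable capacity drops to TX_BUFF_SZ - 1).

```c
#include <stdbool.h>

#define TX_BUFF_SZ 16

static char TxBuffer[TX_BUFF_SZ];
static volatile int InsertIDX = 0;
static volatile int ExtractIDX = 0;
static volatile int TxEmptyInterruptEnable = 0; /* stand-in for the hardware bit */

/* Hypothetical variant of rb_putchar(): returns false instead of
   overwriting unsent data when the buffer is full. */
bool rb_try_putchar(char ch)
{
    int next = (InsertIDX + 1) & (TX_BUFF_SZ - 1);

    if (next == ExtractIDX)
        return false;            /* full: one slot is kept empty on purpose */

    TxBuffer[InsertIDX] = ch;
    InsertIDX = next;
    TxEmptyInterruptEnable = 1;  /* there is now data to send */
    return true;
}
```

The caller then decides what to do on a false return: drop the character, retry later, or spin until the ISR frees a slot (which brings back the blocking behavior, but only when the buffer is actually full).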
Now let's deal with feeding each character one-by-one to the USART. This would be done inside a TxEmpty interrupt service routine (ISR). Observe that when ExtractIDX catches up to InsertIDX, there is no more data in the ring buffer to send out, so we need to disable further TxEmpty interrupts:
void ISR_TxEmpty(void)
{
    UartTX = TxBuffer[ExtractIDX++];
    ExtractIDX &= (TX_BUFF_SZ - 1);
    if (ExtractIDX == InsertIDX)
        TxEmptyInterruptEnable = 0; // Disable further TX empty interrupts
}
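To see the two halves working together without any hardware, here is a rough host-side simulation. It is a sketch, not target code: UartTX and TxEmptyInterruptEnable are ordinary variables standing in for the real registers, and simulate_send() is a hypothetical test helper that "fires" the ISR by calling it in a loop for as long as the interrupt is enabled.

```c
#include <stddef.h>

#define TX_BUFF_SZ 16

static char TxBuffer[TX_BUFF_SZ];
static volatile int InsertIDX = 0;
static volatile int ExtractIDX = 0;
static volatile int TxEmptyInterruptEnable = 0; /* stand-in for the hardware bit */
static volatile char UartTX = 0;                /* stand-in for the data register */

void rb_putchar(char ch)
{
    TxBuffer[InsertIDX++] = ch;
    InsertIDX &= (TX_BUFF_SZ - 1);
    TxEmptyInterruptEnable = 1;
}

void ISR_TxEmpty(void)
{
    UartTX = TxBuffer[ExtractIDX++];
    ExtractIDX &= (TX_BUFF_SZ - 1);
    if (ExtractIDX == InsertIDX)
        TxEmptyInterruptEnable = 0;
}

/* Hypothetical helper: queue msg (at most TX_BUFF_SZ characters), then
   call the ISR while the interrupt is enabled, copying each "transmitted"
   byte into out. out_sz must be at least 1. Returns the byte count. */
size_t simulate_send(const char *msg, char *out, size_t out_sz)
{
    size_t n = 0;

    while (*msg != '\0')
        rb_putchar(*msg++);

    while (TxEmptyInterruptEnable && n + 1 < out_sz) {
        ISR_TxEmpty();
        out[n++] = UartTX;
    }
    out[n] = '\0';
    return n;
}
```

Calling simulate_send() twice in a row is a handy way to watch the indexes wrap: the second message starts wherever the first one left off and runs past the end of the array back to slot 0.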
Caveats
To help keep the focus on the core concepts, I've kept the code above very simple, but in practice, you'll need to qualify your declarations of InsertIDX and ExtractIDX as volatile so that the compiler recognizes these variables can be updated outside of the normal function execution path.
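As a rough illustration of that caveat, the declarations might look like the sketch below. I've also modeled TxEmptyInterruptEnable as a volatile variable, which is an assumption on my part; on a real part it would be a bit in a device register that your vendor header already declares appropriately.

```c
#define TX_BUFF_SZ 16

static char TxBuffer[TX_BUFF_SZ];

/* volatile tells the compiler that these can change behind its back
   (here, inside the ISR), so it must re-read them from memory on every
   access rather than caching a copy in a register. */
static volatile int InsertIDX = 0;
static volatile int ExtractIDX = 0;
static volatile int TxEmptyInterruptEnable = 0; /* stand-in for the hardware bit */

void rb_putchar(char ch)
{
    int idx = InsertIDX;                      /* one explicit read of the volatile */

    TxBuffer[idx] = ch;
    InsertIDX = (idx + 1) & (TX_BUFF_SZ - 1); /* one explicit write */
    TxEmptyInterruptEnable = 1;
}
```

Copying the index into a local first is a style choice rather than a requirement. It just makes each read and write of the shared variable visible in the source. Note that volatile by itself does not make a read-modify-write sequence atomic.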